This e-portfolio documents my work for the SQ4012 Data Visualisation module. It demonstrates my ability to plan, execute, and refine data visualisations using R and the Grammar of Graphics. The portfolio is structured into four tasks that trace the journey from theoretical design to dynamic animation.
Below is a structured plan (SPEC) for the graphic implemented in Task 3. This specification acts as a blueprint, ensuring the visualization is fully defined before any code is written.
Data: the diamonds dataset (built into ggplot2).
Variables:
- carat (Quantitative): the weight of the diamond.
- price (Quantitative): the price in US dollars.
- cut (Ordinal): quality of the cut (Fair, Good, Very Good, Premium, Ideal).
- clarity (Ordinal): a measurement of how clear the diamond is.
Aesthetic mappings:
- x-axis: carat, to show the independent variable (weight).
- y-axis: price, to show the dependent variable (cost).
- Color: cut. This uses a discrete color scale to visually distinguish between quality tiers within the scatter plot.
Geometries:
- geom_point: visualizes the individual diamonds, showing the distribution and density of the data.
- geom_smooth: a loess smoothing line summarizes the trend between weight and price.
Facets: clarity. This creates “small multiples”, allowing comparison across clarity grades without overcrowding a single plot.

Writing a specification (SPEC) like the one above is a critical step in reproducible data science. By separating the design (the logical mapping of data to visuals) from the execution (the actual R code), we ensure that the visualization plan is robust.
If I were required to switch tools (for example, from R's ggplot2 to Python's matplotlib, or to Tableau), this SPEC would remain valid because it describes the grammar of the graphic rather than the syntax of a particular tool. Furthermore, planning scales and coordinates explicitly forces the designer to catch potential issues, such as overplotting or colorblind-unfriendly palettes, before time is spent writing and debugging code.
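As a sketch of how the SPEC can stay separate from the code, the design can be written down as plain R data and checked against the dataset before any plotting. The `spec` list, its field names, and the `check_types()` helper below are my own illustrative assumptions, not part of ggplot2.

```r
library(ggplot2)

# Hypothetical representation of the SPEC as plain data (field names are illustrative)
spec <- list(
  data       = "diamonds",
  variables  = c(carat = "quantitative", price = "quantitative",
                 cut = "ordinal", clarity = "ordinal"),
  aesthetics = c(x = "carat", y = "price", colour = "cut"),
  geoms      = c("point", "smooth"),
  facet      = "clarity"
)

# Hypothetical helper: does each variable's declared type match how it is stored?
check_types <- function(df, declared) {
  vapply(names(declared), function(v) {
    switch(declared[[v]],
           quantitative = is.numeric(df[[v]]),
           ordinal      = is.ordered(df[[v]]),
           nominal      = is.factor(df[[v]]) && !is.ordered(df[[v]]),
           FALSE)  # unknown type label: treat as a mismatch
  }, logical(1))
}

check_types(diamonds, spec$variables)
```

Each entry should come back TRUE; a FALSE would flag a mismatch between the plan and the data before any plotting code is written.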
The “Grammar of Graphics,” introduced by Leland Wilkinson and popularized in R by Hadley Wickham's ggplot2, provides a consistent framework for describing statistical graphics. Rather than treating charts as a fixed “menagerie” of named types (e.g., a “bar chart” vs. a “pie chart”), the Grammar of Graphics lets us build charts component by component. This essay defines the key elements of this grammar.
Aesthetics are the visual properties of the marks on a plot, the channels through which the eye perceives the data. In the grammar, we “map” data variables to aesthetic attributes such as x-position, y-position, colour, size, and shape. For example, in a scatter plot we might map engine size to the x-axis (position) and fuel efficiency to the y-axis (position). Aesthetics are the bridge between raw numbers and visual properties.
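A minimal illustration of the engine-size example, using the mpg dataset bundled with ggplot2, where displ is engine displacement in litres and hwy is highway miles per gallon. The mapping is a single aes() call; layer_data() shows the mapped values once the plot is built.

```r
library(ggplot2)

# Map engine size to x-position and fuel efficiency to y-position
p_aes <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point()

# After building, the layer's x and y columns hold the mapped data values
built_aes <- layer_data(p_aes)
head(built_aes[, c("x", "y")])
```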
Scales control the mapping from data space to aesthetic space. If a variable ranges from 0 to 1000, the scale determines how that range is placed on the screen (e.g., 0 to 500 pixels). Scales also handle transformations, such as applying a logarithmic transformation to better visualize exponential growth, and they map categories (like “Male/Female”) to specific colours (like “Blue/Red”).
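A small sketch of these ideas on made-up data (the `growth` data frame and its `group` column are hypothetical): scales::rescale() performs the plain linear map from a data range to an output range, while scale_y_log10() and scale_colour_manual() attach a log transformation and a category-to-colour mapping to a plot.

```r
library(ggplot2)

# Linear scale: map the data range 0..1000 onto an output range 0..500
scales::rescale(c(0, 250, 1000), to = c(0, 500), from = c(0, 1000))  # 0, 125, 500

# Hypothetical data: exponential growth y = 2^t, with an illustrative two-level category
growth <- data.frame(
  t = 0:10,
  y = 2^(0:10),
  group = rep(c("Male", "Female"), length.out = 11)
)

p_scales <- ggplot(growth, aes(x = t, y = y, colour = group)) +
  geom_point() +
  scale_y_log10() +                                               # log transform of y
  scale_colour_manual(values = c(Male = "blue", Female = "red"))  # category -> colour

# On a log10 scale, y = 2^t becomes a straight line; the built data stores log10(y)
built_scales <- layer_data(p_scales)
head(built_scales[, c("x", "y", "colour")])
```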
Geometries (or “geoms”) are the geometric objects actually drawn on the plot to represent the data. Common geometries include points (for scatter plots), bars (for bar charts), and lines (for time series). A single plot can contain multiple geometries; for instance, a chart might layer a geom_point (showing raw data) underneath a geom_smooth (showing a statistical trend line).
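A minimal sketch of that layering on the mpg dataset: layers are drawn in the order they are added, so the points come first and the trend line sits on top. Each layer also carries its own computed data, which layer_data() exposes by layer index.

```r
library(ggplot2)

# Layer 1: raw observations; layer 2: a loess trend line drawn on top of them
p_layers <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point(alpha = 0.5) +
  geom_smooth(method = "loess", formula = y ~ x, se = FALSE)

# Layer 1 holds one row per car; layer 2 holds points along the fitted curve
nrow(layer_data(p_layers, 1))
nrow(layer_data(p_layers, 2))
```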
The coordinate system defines the space in which the data is drawn. The most common system is Cartesian (x and y axes at right angles), but the grammar also allows alternatives such as polar coordinates. A pie chart, for example, is a single stacked bar plotted in polar coordinates: the stacked heights become angles instead of lengths.
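The pie-chart claim can be checked directly in ggplot2. In this sketch (using the diamonds data for convenience), the only difference between the two plots is the added coord_polar(theta = "y"); the stacked segment boundaries computed for the layer are the same in both.

```r
library(ggplot2)

# A single stacked bar: one x position, segments filled by cut, heights as proportions
p_bar <- ggplot(diamonds, aes(x = factor(1), fill = cut)) +
  geom_bar(position = "fill", width = 1)

# Same data, geometry and scales; stacked height is now mapped to angle: a pie chart
p_pie <- p_bar + coord_polar(theta = "y")

# Only the coordinate system differs, so the computed segment boundaries match
all.equal(layer_data(p_bar)$ymax, layer_data(p_pie)$ymax)
```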
Guides allow the viewer to “read” the plot back into data. They include axes (which decode position) and legends (which decode colour, size, or shape).

Faceting splits a dataset into subsets according to one or more categorical variables and draws a matrix of small, similar plots, one per subset. This makes it easy to compare groups without the visual clutter of overlapping points.
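A short sketch of both components on the mpg dataset: facet_wrap() assigns every observation to one panel per vehicle class, and labs() sets the titles the axis guides display. The axis titles chosen here are my own.

```r
library(ggplot2)

# One small panel per vehicle class; panels share axes so positions stay comparable
p_facet <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point() +
  facet_wrap(~class) +
  labs(x = "Engine displacement (L)", y = "Highway mpg")  # titles shown by the axis guides

# The PANEL column records which facet each observation was drawn in
table(layer_data(p_facet)$PANEL)
```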
By understanding these components, we gain the ability to construct arbitrarily complex graphics. We are not limited to preset Excel templates; we can mix and match geometries, switch coordinate systems, and adjust scales to reveal the story in the data.
For this task, I use the diamonds dataset to create a complex visualization that employs multiple aesthetics and faceting.
library(ggplot2)
library(dplyr)
# Subsetting for clearer visualization (random 1000 rows to prevent overplotting)
set.seed(123)
diamonds_subset <- diamonds %>% sample_n(1000)

This plot embodies the Grammar of Graphics by layering points and trend lines, mapping color to one categorical variable (cut), and faceting by another (clarity).
ggplot(diamonds_subset, aes(x = carat, y = price)) +
  # Geometry 1: Points with color mapped to Cut
  geom_point(aes(color = cut), alpha = 0.6, size = 2) +
  # Geometry 2: Smooth trend line
  geom_smooth(method = "loess", color = "black", se = FALSE, linewidth = 0.5) +
  # Faceting: Split the chart by Clarity
  facet_wrap(~clarity, nrow = 2) +
  # Scales
  scale_color_brewer(palette = "Dark2") +
  scale_y_continuous(labels = scales::dollar_format()) +
  # Guides
  labs(
    title = "Complex Relationship: Price vs Carat",
    subtitle = "Faceted by Clarity (Subset of n=1000)",
    x = "Carat (Weight)",
    y = "Price (USD)",
    color = "Cut Quality"
  ) +
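  theme_minimal()

To demonstrate the flexibility of the grammar, I have transformed a summary of the same dataset into polar coordinates. With theta = "y", each bar of a proportional stacked bar chart (one bar per cut) is bent into a ring, so the chart becomes a set of concentric rings in which each clarity grade's share appears as an arc. (A true “Coxcomb” chart would instead map the categories to angle, using the default theta = "x".)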
ggplot(diamonds, aes(x = cut, fill = clarity)) +
  geom_bar(position = "fill") +
  coord_polar(theta = "y") +
  labs(
    title = "Proportion of Clarity within Cuts",
    subtitle = "Polar Coordinate Variation",
    x = NULL,
    y = NULL
  ) +
  # theme_minimal() rather than theme_void() keeps the cut labels that identify each ring
  theme_minimal() +
  scale_fill_viridis_d()

The two variants are perceived very differently.
The Cartesian scatter plot (Figure 1) allows precise analytical comparison. We can easily see that price increases with carat, and we can compare the slope of this increase across the clarity facets. The axes provide a clear reference frame.
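To put a rough number on the slope comparison, each clarity facet can be summarised by a straight-line slope of price on carat. This is my own simplification of the loess curves in Figure 1, not part of the original analysis; it uses the ordinary least-squares slope cov(x, y) / var(x), in US dollars per additional carat.

```r
library(ggplot2)
library(dplyr)

# Recreate the 1,000-row subset used for Figure 1
set.seed(123)
diamonds_subset <- diamonds %>% sample_n(1000)

# Simplification: one least-squares slope per clarity grade, slope = cov(x, y) / var(x)
slopes <- diamonds_subset %>%
  group_by(clarity) %>%
  summarise(
    n = n(),
    usd_per_carat = cov(carat, price) / var(carat),
    .groups = "drop"
  )

slopes
```

Facets with few diamonds (such as the rarest clarity grades) give less stable slopes, which is worth remembering when comparing the smooth lines panel by panel.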
In contrast, the polar chart (Figure 2) is visually striking and emphasizes the part-to-whole nature of the data, but it distorts perception. Humans are generally better at comparing linear lengths (bars) than angles or arc lengths (polar segments), and in this layout the outer rings are physically longer than the inner ones, so the same proportion looks larger on an outer ring than on an inner one. While the polar chart effectively shows that ‘Ideal’ cuts have a diverse distribution of clarity, it makes exact proportions harder to read than a standard stacked bar chart does.
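When exact proportions matter, they can be computed from the same data rather than estimated from arcs. A minimal sketch: the share of each clarity grade within each cut, i.e. the values the rings in Figure 2 encode.

```r
library(ggplot2)
library(dplyr)

# Share of each clarity grade within each cut
clarity_share <- diamonds %>%
  count(cut, clarity) %>%
  group_by(cut) %>%
  mutate(prop = n / sum(n)) %>%
  ungroup()

# Exact proportions for the 'Ideal' ring, rounded for reading
clarity_share %>%
  filter(cut == "Ideal") %>%
  mutate(prop = round(prop, 3))
```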
I am using the {gganimate} package to visualize the gapminder dataset over time. As the assignment requires, I have added a “twist” by focusing on a specific global event: the 1994 genocide in Rwanda.
library(gganimate)
library(gapminder)
# Create a column to highlight Rwanda specifically
gapminder_twist <- gapminder %>%
  mutate(highlight = ifelse(country == "Rwanda", "Rwanda", "Other")) %>%
  arrange(highlight) # "Other" sorts before "Rwanda", so Rwanda's rows are drawn last (on top)

The animation below tracks life expectancy against GDP per capita. A static chart might simply show Rwanda as an outlier, but the animation reveals when and how fast the tragedy unfolded.
p <- ggplot(gapminder_twist, aes(gdpPercap, lifeExp, size = pop, color = highlight)) +
  geom_point(alpha = 0.7) +
  # Highlighting Rwanda in Red, others in neutral Grey
  scale_color_manual(values = c("Other" = "grey80", "Rwanda" = "red")) +
  scale_size(range = c(2, 12), guide = "none") +
  scale_x_log10() +
  labs(title = 'Year: {frame_time}',
       subtitle = 'Highlighting the impact of the 1994 crisis in Rwanda (Red)',
       x = 'GDP per capita (log scale)',
       y = 'Life Expectancy') +
  theme_minimal() +
  # Animation transitions
  transition_time(year) +
  ease_aes('linear')
# Render the animation (gifski_renderer() requires the gifski package)
animate(p, renderer = gifski_renderer(), duration = 15, fps = 20, end_pause = 10)

Using animation adds a dimension of “storytelling” that static graphics lack. In this visualization, the “twist” is the sudden, steep drop of the red dot (Rwanda) in the early 1990s.
In a static plot, this data point might just appear as a low outlier mixed in with other data. However, the animation forces the audience to witness the change. We see the country progressing normally, and then suddenly crashing, before recovering. This movement elicits a stronger emotional and cognitive response, making the pattern of the event (the 1994 genocide) undeniable and visually distinct from the general global trend of increasing health and wealth.
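To check the “twist” against the underlying numbers, Rwanda's rows can be pulled out of gapminder directly. One caveat: gapminder holds one observation every five years (1952 to 2007), and transition_time() interpolates the frames in between, so the timing of the fall in the animation is set by those observation years rather than by the calendar date of the event.

```r
library(dplyr)
library(gapminder)

# Rwanda's observations only: one row per five-year observation
rwanda <- gapminder %>%
  filter(country == "Rwanda") %>%
  select(year, lifeExp, gdpPercap)

rwanda

# Observation year with the lowest life expectancy, where the red point bottoms out
rwanda$year[which.min(rwanda$lifeExp)]
```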